Towards Average Case Analysis of Itemset Mining
Abstract
We perform a statistical analysis and describe the asymptotic behavior of the frequency and size distribution of δ-occurrent, minimal δ-occurrent, and maximal δ-occurrent itemsets occurring in random datasets across the entire spectrum of δ. We also describe the probability distribution of the support of an n-element itemset in a random dataset. We find that for small values of δ relative to the number of transactions, the size distributions of δ-occurrent itemsets and maximal δ-occurrent itemsets can be approximated by the binomial distributions $b(L, \frac{1}{1+2^{\delta}})$ and $b(L, \frac{1}{2^{\delta}})$, respectively, where L is the inventory size. The ratio of minimal δ-occurrent and maximal δ-occurrent itemsets to the total number of δ-occurrent itemsets is low for small values of δ and rapidly approaches 1 as δ approaches the number of transactions. We also prove that the probability distribution of the support of an n-element itemset in a random k-transaction dataset is binomial of type $b(k, \frac{1}{2^{n}})$.
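The binomial law for the support can be checked by brute force on a tiny case. The sketch below assumes that a "random dataset" means each item appears in each transaction independently with probability 1/2, so that all datasets are equally likely; this model is an assumption suggested by the parameter 1/2^n, not a definition taken from the abstract. The function names are hypothetical. It enumerates every k-transaction dataset restricted to the n items of a fixed itemset and compares the exact support distribution with b(k, 1/2^n).

```python
from fractions import Fraction
from itertools import product
from math import comb


def support_distribution_by_enumeration(k, n):
    """Exact support distribution of a fixed n-itemset.

    Assumption: all 2**(k*n) 0/1 matrices (k transactions, n items)
    are equally likely.
    """
    counts = [0] * (k + 1)
    for bits in product((0, 1), repeat=k * n):
        # Split the flat bit tuple into k transactions of n items each.
        rows = [bits[i * n:(i + 1) * n] for i in range(k)]
        # Support = number of transactions containing all n items.
        support = sum(1 for row in rows if all(row))
        counts[support] += 1
    total = 2 ** (k * n)
    return [Fraction(c, total) for c in counts]


def binomial_pmf(k, p):
    """Probability mass function of b(k, p) as a list indexed by support."""
    return [comb(k, s) * p**s * (1 - p) ** (k - s) for s in range(k + 1)]


k, n = 3, 2
enumerated = support_distribution_by_enumeration(k, n)
predicted = binomial_pmf(k, Fraction(1, 2**n))
print(enumerated == predicted)  # prints True
```

Exact fractions are used instead of floats so that the two distributions can be compared for equality. The enumeration grows as 2^(k·n), so it is only practical for very small k and n.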
Similar resources
A New Algorithm for High Average-utility Itemset Mining
High utility itemset mining (HUIM) is an emerging field in data mining which has gained growing interest due to its various applications. The goal of this problem is to discover all itemsets whose utility exceeds a minimum threshold. The basic HUIM problem does not consider the length of itemsets in its utility measurement and utility values tend to become higher for itemsets containing more items...
Weighted Itemset Mining from Bigdata using Hadoop
Data items have been extracted using an empirical data mining technique called frequent itemset mining. In the majority of application contexts, items are enriched with weights. Pushing item weights into the itemset extraction process, i.e., mining weighted itemsets rather than traditional itemsets, is an appealing research direction. Although many efficient weighted itemset mining algorithms a...
A Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining
Data mining and knowledge discovery is an important research area. In this paper we present a survey of various methods for frequent pattern mining. Over the past decade, frequent pattern mining has played a very important role, but it does not consider the weight factor or value of the items. The very first and basic technique to find the correlation of data is Association Rule Mining. In ARM...
Analysis of Frequent Item set Mining on Variant Datasets
Association rule mining is the process of discovering relationships among the data items in a large database. It is one of the most important problems in the field of data mining. Finding frequent itemsets is one of the most computationally expensive tasks in association rule mining. The classical frequent itemset mining approaches mine the frequent itemsets from the database where presence of an...
Accelerating Closed Frequent Itemset Mining by Elimination of Null Transactions
The mining of frequent itemsets is often challenged by the length of the patterns mined and also by the number of transactions considered for the mining process. Another acute challenge that concerns the performance of any association rule mining algorithm is the presence of 'null' transactions. This work proposes a closed frequent itemset mining algorithm viz., Closed Frequent Itemset Mining a...
Publication date: 2007